-
Notifications
You must be signed in to change notification settings - Fork 136
fix: Preserve pandas column name attribute #2363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
narwhals/_dask/dataframe.py
Outdated
@@ -110,7 +110,7 @@ def collect( | |||
from narwhals._pandas_like.dataframe import PandasLikeDataFrame | |||
|
|||
return PandasLikeDataFrame( | |||
result, | |||
result.rename_axis(columns=self.native.columns.name), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be one step ahead of dask#11874
backend_version=parse_version(pd), | ||
implementation=Implementation.PANDAS, | ||
backend_version=parse_version(pd), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mental sanity for symmetry with other pandas-like and order of the spec π
|
||
df = nw.from_native(df_native) | ||
|
||
result = df.with_columns(b=nw.col("a") + 1, c=nw.col("a") * 2).select("c", "b") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to concatenate two methods here, mostly for the sake of it.
Might be worth reparametrizing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dask select
and with_columns
have no issues since we use .assign
method for both!
thanks! i'll do a little test to check there's no accidental / surprise overhead |
@MarcoGorelli I tried a simple script to check import timeit
from pprint import pprint
import numpy as np
import pandas as pd
def benchmark_rename_axis_overhead(n_rows: int, n_cols: int, repetitions: int = 1000) -> float:
df = pd.DataFrame(np.random.randn(n_rows, n_cols))
return timeit.timeit(
lambda: df.rename_axis(columns="new_name", copy=False),
number=repetitions
) / repetitions
TOTAL_SIZE = 10_000_000
results = {
(int(TOTAL_SIZE/10**i), 10**i): benchmark_rename_axis_overhead(n_rows=int(TOTAL_SIZE/10**i), n_cols=10**i)
for i in range(1, 6)
}
pprint(results)
Let me know what your thoughts are and if we need to check something more extensively |
Single argument + fake typing π
|
thanks!
i'm still slightly concerned about there being extra copies in old versions, i'll take another look |
No rush! I keep merging main to avoid having to deal with a large conflicting diff if that happens to be the case |
What type of PR is this? (check all applicable)
Related issues
select
andwith_columns
don't always preserve column index name for pandas-likeΒ #1483Checklist